Test Correlates of Air Force Weather Forecaster Proficiency

ABSTRACT 

Results are presented from the administration of a battery of twenty-two tests to 76 Air Force weather forecasters who constituted criterion groups of "good" and "poor" forecasters selected by the nominating technique. Five potential predictors were identified.
I. Introduction*

This may be the concluding report in a research program designed to develop evaluation and selection instruments for Air Force weather forecasters. Previous reports have (a) described the overall program (5; 7), (b) identified AF weather forecaster proficiency characteristics (4), (c) described the development of the criterion data and two evaluation instruments (5), and (d) determined significant differences in education, training, experience, and age between good and poor weather forecasters (6).

Reports of successful identification or development of predictor tests for high-level occupations are sparse. In addition to the problem of securing a sample large enough for which common criteria are applicable, there is the problem of securing cooperative subjects.

The present study marks the second attempt to develop predictors for Air Force weather forecasters (3).

In the prior study, in 1948, Jenkins secured data on the following variables: education, college major, mathematics background, forecasting and observing experience, kind of meteorological training, forecasting aids most frequently used, speed and accuracy of perception, spatial relations ability, general academic ability, and vocational interests. Information on the first six variables was gathered by questionnaire. The last four variables were measured, respectively, by the Minnesota Clerical Test, the Revised Minnesota Paper Form Board, the Ohio State University Psychological Test, and the Strong Vocational Interest Blank for Men.

* A condensation of this report was presented under the same title at the American Psychological Association meetings in Los Angeles in September 1964.


Jenkins concluded that Air Force weather forecasters were a highly select group in educational background and in their clerical*, spatial relations, and general academic abilities. Only the Names section of the Minnesota Clerical Test proved to be a consistent predictor of forecasting skill, with a correlation of .31. Whereas Jenkins' findings depended on correlations with a short-range forecast verification score as the criterion, the results of this study are based upon criterion ratings by colleagues who had worked with the ratees as forecasters for over three months. The analyses culminating in the reports referenced above, in conjunction with Jenkins' findings, gave rise to the following decisions:

1. Another attempt to find test correlates of weather forecasting proficiency was warranted because of the high quality of the criterion data in the present study. Evidence for the reliability of the criterion ratings was the finding that only 4% of over 22,000 ratings, applicable to 1,605 officer forecasters, were contradictory (5, pp. 5-6). Evidence for the validity of the ratings was found in the number and kinds of biographical characteristics on which significant differences were found both between "good" (high criterion) and "poor" (low criterion) officer weather forecasters and between "good" and "poor" enlisted forecasters (6).

2. Tests of a high order of difficulty should be employed, in view of the number of mean scores that Jenkins found to fall at very high percentiles.

*It is believed more appropriate to speak of this ability as speed and accuracy of perception.
3. Tests of greater pertinence to the abilities required of weather forecasters were needed, with special attention to spatial relations. It was believed that the proficiency characteristics already identified (4) provided added clues for the selection of a trial test battery.

4. Verification of the criticality of perceptual speed and accuracy, as found by Jenkins with the Names section of the Minnesota Clerical Test, should be sought.

II. Procedure

A. Selection of Trial Test Battery

To implement the aims set forth above, a variety of Armed Forces tests were reviewed, and one public and three private test development organizations were consulted. Because of his concentration on the study of high-level aptitudes, Professor Guilford was consulted, and through him assistance was obtained from Dr. Philip R. Merrifield, who had experience in weather forecasting. Initially 26 tests were included in the trial battery, which took two days to administer; however, four tests were dropped from the battery in the early stages, which reduced testing time to a day and a half. Tests for which a complete set of data was secured are listed in Table I; nineteen of them are shown by major classification in Guilford and Merrifield's "The Structure of Intellect Model: Its Uses and Implications" (April 1960), where brief descriptions of those so classified can be found (1). Tests dropped from the battery are listed, with the reasons for dropping them, in Appendix C.

Other tests included in the trial battery were the Minnesota Clerical Test, DuBois and Gleser's Object-aperture Test, and items requiring cube matching from the U. S. Civil Service Commission (courtesy of Ernest J. Primoff). Each item of the Object-aperture Test shows a different three-dimensional object accompanied by five different two-dimensional apertures, or openings; the subject's task is to select the opening through which the three-dimensional object could pass. The test has no specified time limit. The Civil Service Cubes Test consisted of twenty items, each presenting two separate cubes, one face of one cube containing two holes and one face of the other containing two pegs, together with four ways in which the two cubes had been joined. Three faces of each single cube, including the face with holes or pegs, were shown, whereas in each joined combination only two faces of one cube and three of the other were shown. Each cube face had a unique design. The subject's task was to select the combination that could be formed by joining the two separate cubes. Subjects were allowed 35 minutes for this task.

B. Selection of Subjects

The primary ground rules for the selection of subjects were the same as those used in selecting forecasters for the biographical analysis, namely: "possessed a proficiency index of 1.33 and above or .90 and below as developed from ratings of officers with whom they had worked as forecasters" (6)*. A second consideration was geographic location and the availability of travel funds. Many subjects volunteered; more were secured by command.

*Proficiency indices were developed by scoring two points for an above-average rating, one point for an average rating, and no points for a below-average rating, and dividing the total points by the number of ratings. Added criteria for inclusion in the present study were that six ratings be available for each subject and that no subject be included in the low criterion group who had not been judged below average at least twice. It should be understood that when words such as "forecasting proficiency" and the like are used in this report, their connotation is limited to ratings by colleagues.
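The proficiency-index rule in the footnote above can be expressed as a short computation. The sketch below is a minimal Python illustration, not the procedure actually used in the study: the rating labels ("above", "average", "below") and the function names are hypothetical, and "six ratings be available" is read here, as an assumption, to mean "at least six ratings".

```python
from typing import List, Optional

# Points per rating, as described in the footnote.
# The label strings are hypothetical; the report does not specify a coding.
POINTS = {"above": 2, "average": 1, "below": 0}


def proficiency_index(ratings: List[str]) -> float:
    """Total points divided by the number of ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(POINTS[r] for r in ratings) / len(ratings)


def criterion_group(ratings: List[str]) -> Optional[str]:
    """Return "high", "low", or None for one forecaster's ratings.

    Thresholds from the text: index of 1.33 and above -> high,
    .90 and below -> low. Added rules (assumed reading): at least six
    ratings, and a low-group member must have been rated below
    average at least twice.
    """
    if len(ratings) < 6:
        return None
    index = proficiency_index(ratings)
    if index >= 1.33:
        return "high"
    if index <= 0.90 and ratings.count("below") >= 2:
        return "low"
    return None
```

For example, six ratings of which two are above average and four are average give an index of 8/6, about 1.33, which just meets the high-criterion threshold.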
In order to prevent informal labelling of the forecasters as to criterion status, a number of forecasters who fell within neither the high nor the low criterion group specified above were also administered the complete trial test battery. High criterion N = 39; low criterion N = 37.

The criterion status of each of the 76 officers who composed the high and low criterion groups is presented with their test scores in Appendix A. The subjects ranged in rank from captain through full colonel, with ten warrant officers included within the groups; the approximate age range was 39 to 47.

C. Test Administration

Tests were administered to groups as large as twelve and to single individuals. The administration of every test to every subject was directly supervised and monitored. The data were obtained from July 1961 to December 1963.
TABLE I

                             Mean        P       Std. Dev.
COGNITION (9)             High   Low    (χ²)    High   Low    Scoring
Competitive Planning      17.4   16.2            5.6   6.2    R - 1/2W + 3
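The Scoring column gives each test's scoring formula. Reading R as the number of items answered correctly and W as the number answered incorrectly (an assumption; the symbols are not defined in this excerpt), the Competitive Planning formula R - 1/2W + 3 could be applied as in the minimal sketch below. A penalty of one-half per wrong answer matches the familiar correction for guessing, R - W/(k - 1) with k = 3 response options, but the report does not state that this was the rationale, nor what the added constant 3 is for. The function name is hypothetical.

```python
def competitive_planning_score(rights: int, wrongs: int) -> float:
    """Score = R - W/2 + 3, reading R as items right and W as items wrong.

    Omitted items are assumed to carry no penalty.
    """
    if rights < 0 or wrongs < 0:
        raise ValueError("counts must be non-negative")
    return rights - wrongs / 2 + 3
```

For instance, 20 items right and 4 wrong would score 20 - 2 + 3 = 21.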
III. Discussion and Results


Table I presents the means and standard deviations for the high and low criterion groups, together with certain chi-square probabilities and the formula by which each test was scored.

A. The Spatial Problem

Before directing attention to the positive findings of this study, it is appropriate to point out the lack of success in identifying a predictor of spatial ability that differentiates the criterion groups at a significant level. (The frequent lack of correspondence between parameters generated from verbal statements of job requirements and those generated by aptitude testing has long plagued psychologists.) Because forecasters themselves placed very considerable emphasis on the importance of spatial ability when analyzing the performance of the "best" and "poorest" forecasters with whom they had worked, special efforts were made to include a variety of spatial tests in the trial test battery. Not only do forecasters need to visualize weather in three dimensions, but they must also contend with associated acceleration and deceleration trends. Five tests of a three-dimensional spatial nature were included in the initial trial test battery, and scores were obtained for all 76 officers on four of them.

Since spatial tests formed part of the test battery for the selection of air crew members during WW II, it was desirable to ascertain the representation of former air crew members within the two criterion groups. Twenty-two of the 37 members of the low criterion group were former air crew members (pilots, navigators, or bombardiers), whereas only 4 of the 39 members of the high criterion group had such rated military air-crew experience. Accordingly, it is not surprising that the only two tests on which average scores for the low criterion group equalled or exceeded high criterion scores were tests of spatial ability, specifically Civil Service Cubes and G-Z Spatial Orientation, respectively. The failure to find a significant difference between good and poor forecasters on even one of the spatial ability tests may be attributable to greater preselection for spatial ability among the low criterion group. Hence, it is not appropriate to conclude that spatial ability is not both germane and important for weather forecasting, but merely that this particular study has not demonstrated its differential criticality. Even an item analysis of the spatial ability tests yielded nothing promising.
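The contrast in prior air-crew service between the two criterion groups (22 of 37 low-criterion members versus 4 of 39 high-criterion members) can be quantified with a Pearson chi-square test on a 2 × 2 table. The report does not present such a test for these counts, and it does not describe how the chi-square probabilities in Table I were computed. The sketch below therefore only illustrates the standard 2 × 2 computation (without continuity correction); the function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (1 df, no continuity correction) for the table

                 col 1   col 2
        row 1      a       b
        row 2      c       d

    Returns (chi_square, p_value).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: low and high criterion groups.
# Columns: former air-crew members, all others.
chi2, p = chi_square_2x2(22, 37 - 22, 4, 39 - 4)
```

With these counts the statistic is about 20.4 on 1 degree of freedom, far beyond the 3.84 required at the 5% level, which is consistent with the observation that the low criterion group had been heavily preselected through air-crew selection.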

An examination of the scores made by the Air Force weather forecasters on the two-dimensional Minnesota Paper Form Board reported by Jenkins, and of their scores on the G-Z Spatial Orientation and G-Z Spatial Visualization in this study, indicates that spatial ability may be quite important for weather forecasting. Normative data are available for these tests. Jenkins reported as follows (3): "The mean score on the revised Minnesota Paper Form Board when compared to . . . various male industrial groups falls from the 80th to the 97th percentile with a median value at the 00th percentile. Even compared to first and fifth year engineering students the percentile ranks are 80 and 70 respectively." In this study the mean score for the 76 forecasters on the Spatial Orientation test was 24.6, which falls at about the 65th percentile when compared to G-Z norms for college men. For the Spatial Visualization test (Form B), scores for both 10 and 13 minutes were obtained; Table I and the appendix show only the 13-minute scores. The mean 10-minute score for all 76 weather forecasters was 19.1, which falls at the 61st percentile on G-Z norms for college freshmen. It may be worth noting that on the G-Z Spatial Visualization the difference between the high and low criterion groups was only .7 at 10 minutes, whereas it was 1.3 at 13 minutes.

This suggests the possibility of generating greater variability by permitting more subjects to encounter more of the more difficult items, those embracing three rotations.

B. Positive Findings: Potential Predictors

Scrutiny of Table I discloses significant differences between the two criterion groups at the 5% level or better for five of the tests administered. Table II presents the intercorrelations among nine of the tests and the biserial correlations between the tests and the criterion proficiency ratings. The biserial r for the predicted scores from a discriminant function analysis* was .59.
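The report gives the discriminant function's weights but not how they were derived. One standard two-group procedure is Fisher's linear discriminant, in which the weights are the inverse of the pooled within-group covariance matrix applied to the difference between the two groups' mean score vectors. The pure-Python sketch below illustrates that procedure on the assumption that it resembles the analysis performed at Arthur D. Little; all function names are hypothetical, and any data passed in would stand in for the forecasters' test scores.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column means of a group's score matrix (one row per subject)."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def scatter_matrix(rows: Sequence[Sequence[float]], mean: Vector) -> Matrix:
    """Sums of squares and cross-products of deviations from the group mean."""
    k = len(mean)
    s = [[0.0] * k for _ in range(k)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mean)]
        for i in range(k):
            for j in range(k):
                s[i][j] += dev[i] * dev[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - known) / m[r][r]
    return x


def fisher_weights(high: Sequence[Sequence[float]],
                   low: Sequence[Sequence[float]]) -> Vector:
    """Weights = (pooled within-group covariance)^-1 (mean_high - mean_low)."""
    mean_h, mean_l = mean_vector(high), mean_vector(low)
    s_h, s_l = scatter_matrix(high, mean_h), scatter_matrix(low, mean_l)
    k = len(mean_h)
    dof = len(high) + len(low) - 2
    pooled = [[(s_h[i][j] + s_l[i][j]) / dof for j in range(k)] for i in range(k)]
    return solve(pooled, [h - l for h, l in zip(mean_h, mean_l)])


def discriminant_score(weights: Vector, scores: Sequence[float],
                       constant: float = 0.0) -> float:
    """Linear composite: sum of weight * score plus an additive constant."""
    return sum(w * x for w, x in zip(weights, scores)) + constant
```

Applying `discriminant_score` to each forecaster's test scores would yield predicted scores; correlating those with the high/low criterion gives a figure comparable in kind to the .59 reported above. An additive constant, such as the constant term in the report's function, only shifts the scores and does not change that correlation.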

Actually, … and … alone can provide a multiple R with the criterion of .56.

It may be noted that, in terms of Guilford's "Structure of Intellect", three tests, namely Ship Destination, Pertinent Questions, and Word Matrix, are identified with Cognition, which has been defined (1, p. 5) as "discovery, awareness, rediscovery, or recognition of information in various forms; comprehension or understanding". More specifically, these tests are identified respectively with the subcategories of General Reasoning, Conceptual Foresight, and Semantic Relations.

*Performed at Arthur D. Little, Inc., through the courtesy of Dr. Vincent E. Giuliano, and by Mr. Joel E. Jensen: $y' = 2.554X_{?} + 0.8983X_{?} + 1.921X_{?} + 1.701X_{?} - 189.2$

TABLE II

Correlations with Criterion and Intercorrelations of
Selected Tests Employed in Trial AF Weather
Forecaster Test Battery

Test                     r Biserial    X1    X2    X3    X4    X5    X6    X7    X8
X1 Ship Destination             .52
X2 Correlate Completion         .46   .54
X3 Minnesota Names              .38   .44   .61
X4 Pertinent Questions          .37   .35   .31   .29
X5 Word Matrix                  .33   .32   .36   .28   .11
X6 Word Group Naming            .29   .31   .32   .38   .12   .22
X7 Logical Reasoning            .25   .36   .42   .14   .14   .25   .48
X8 General Reasoning            .20   .41   .41   .19   .04   .24   .28   .45
Perceptual Speed                (see note)

Note: Only eight of the nine entries in the Perceptual Speed row are legible in the source copy: .20, .13, .30, .33, .17, .12, .30, .16 (in that order). Their column positions cannot be assigned with certainty.
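The r Biserial column of Table II relates each test to the two-category criterion (high versus low criterion group). A common textbook formula for the biserial coefficient is r_b = (M_high - M_low) / s_t × pq / y, where s_t is the standard deviation of all scores, p and q are the proportions in the high and low groups, and y is the height of the unit normal curve at the point dividing them. The report does not say which computational formula was used, so the sketch below is only an illustration under that assumption; the function name is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def biserial_r(high: Sequence[float], low: Sequence[float]) -> float:
    """Biserial correlation of test scores with a high/low criterion split.

    Assumes the dichotomy cuts an underlying normally distributed criterion.
    """
    n_high, n_low = len(high), len(low)
    if n_high == 0 or n_low == 0:
        raise ValueError("both groups need at least one score")
    p = n_high / (n_high + n_low)   # proportion in the high group
    q = 1 - p                       # proportion in the low group
    s_total = pstdev(list(high) + list(low))  # SD of all scores combined
    # Height of the unit normal curve at the point that leaves
    # proportion p above it and proportion q below it.
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(q))
    return (fmean(high) - fmean(low)) / s_total * (p * q / y)
```

With the study's group sizes (39 high, 37 low), p is about .51, so the dividing point lies just below the center of the normal curve and y is close to its maximum of about .40.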

The Correlate Completion II test is identified with the Structure of Intellect category Convergent Production, defined by Guilford and Merrifield as "generation of information from given information, where the emphasis is upon achieving unique or conventionally accepted or best outcomes" (1, p. 5); the more specific subcategory represented by this test is Symbolic Correlates.

Although both this study and Jenkins' yielded significant findings for the Minnesota Names test, there is a considerable disparity between the overall mean of 145.8 that Jenkins reported and the overall mean of 125.3 found in this study. Every possible attempt has been made to find a rational explanation of this difference, but it has been impossible to secure identifying data for participants in the former study, and the degree to which the test administrations in the former study were monitored is unknown. Jenkins has stated that the tests were administered by the Air Weather Service (3); from Appendix C of his dissertation it appears that the test packages were mailed to the subjects themselves, who were requested to secure their own monitors (2). When the differences between Jenkins' high and low criterion groups, and their standard deviations, are compared with those of this study, the correspondence is considerably greater than it is for the mean values. For group one, Jenkins found a difference of 16.2 between the upper and lower criterion groups, with respective standard deviations of 22.1 and 37.6 (2, p. 73); for group two, the difference was 18.2, with respective standard deviations of 25.0 and 33.0 (2, p. 99). Although the average age of the group in this study was approximately 14 years greater than that of Jenkins' group, such an age difference would not seem to account for a mean difference of twenty and a half points.

It is recommended that the tests identified as potential predictors in this study be administered to recently qualified Air Force weather forecasters and to newly appointed forecasters, in an effort to provide validating evidence for their operational use.
LEGEND FOR APPENDIX A

a. Criterion Score
b. Associations III (CN05A)
c. Competitive Planning
d. Correlate Completion
e. Civil Service Cubes
f. G-Z General Reasoning
g. Gestalt Transformation
h. Logical Reasoning
i. Match Problems II
j. Match Problems III
k. Minnesota Clerical - Numbers
l. Minnesota Clerical - Names
m. Object-Aperture
n. G-Z Perceptual Speed
o. Pertinent Questions
p. Seeing Trends
q. Ship Destination
r. Similarities
s. Social Situations
t. G-Z Spatial Orientation
u. G-Z Spatial Visualization
v. G-Z Verbal Comprehension
w. Word Group Naming
APPENDIX B

Special Notes on the Testing

Test scores presented in Table I and Appendix A for the G-Z Spatial Visualization were obtained by allowing subjects 13 minutes rather than the 10 minutes prescribed in the instructions.

Test item #13 was omitted from the DuBois-Gleser Object-aperture Test, Form B; hence, the maximum possible score was 27 rather than 28.

Social Situations (EP03A) consisted of 23 items (216-238); the total time allowed subjects was 7½ minutes.


APPENDIX C

Four tests were discontinued from the original battery. An ETS architectural aptitude test designed to tap spatial ability was deemed to require too much administration time (40 minutes) in relation to the number of items subjects were able to complete. An ETS Picture Discrimination Test concerned with perceptual speed seemed to require a disproportionate amount of time to record responses. The Seeing Problems test appeared difficult to score objectively and took too much time to score. The Expressional Fluency test simply was not taken seriously by the Air Force weather forecasters.